Abstract: In this paper, we consider modulation codes for
practical multilevel flash memory storage systems with q cell 
levels. Instead of maximizing the lifetime of the device [7], [1], 
[2], [4], we maximize the average amount of information stored 
per cell-level, which is defined as storage efficiency. Using this 
framework, we show that the worst-case criterion [7], [1], [2] 
and the average-case criterion [4] are two extreme cases of 
our objective function. A self-randomized modulation code is
proposed which is asymptotically optimal, as $q \to \infty$, for an
arbitrary input alphabet and i.i.d. input distribution.

In practical flash memory systems, the number of cell-levels 
q is only moderately large. So the asymptotic performance as
$q \to \infty$ may not tell the whole story. Using the tools from
load-balancing theory, we analyze the storage efficiency of the 
self-randomized modulation code. The result shows that only a 
fraction of the cells are utilized when the number of cell-levels 
q is only moderately large. We also propose a load-balancing 
modulation code, based on a phenomenon known as "the power 
of two random choices" [10], to improve the storage efficiency 
of practical systems. Theoretical analysis and simulation results 
show that our load-balancing modulation codes can provide 
significant gain to practical flash memory storage systems. 
Though pseudo-random, our approach achieves the same load- 
balancing performance, for i.i.d. inputs, as a purely random 
approach based on the power of two random choices. 

I. Introduction 

Information-theoretic research on capacity and coding for 
write-limited memory originates in [12], [3], [5] and [6]. In 
[12], the authors consider a model of write-once memory 
(WOM). In particular, each memory cell can be in state either
0 or 1. The state of a cell can go from 0 to 1, but not from
1 back to 0 later. These write-once bits are called wits. It is
shown that the efficiency of storing information in a WOM
can be improved if one allows multiple rewrites and designs
the storage/rewrite scheme carefully.

Multilevel flash memory is a storage technology where 
the charge level of any cell can be easily increased, but is 
difficult to decrease. Recent multilevel cell technology allows 
many charge levels to be stored in a cell. Cells are organized 
into blocks that contain roughly 10^ cells. The only way 
to decrease the charge level of a cell is to erase the whole 
block (i.e., set the charge on all cells to zero) and reprogram 
each cell. This takes time, consumes energy, and reduces the 
lifetime of the memory. Therefore, it is important to design 
efficient rewriting schemes that maximize the number of 
rewrites between two erasures [7], [1], [2], [4]. The rewriting 
schemes increase some cell charge levels based on the current 
cell state and message to be stored. In this paper, we call a
rewriting scheme a modulation code. 

Two different objective functions for modulation codes are 
primarily considered in previous work: (i) maximizing the 
number of rewrites for the worst case [7], [1], [2] and (ii) 
maximizing for the average case [4]. As Finucane et al. [4] 
mentioned, the reason for considering average performance 
is the averaging effect caused by the large number of erasures 
during the lifetime of a flash memory device. Our analysis
shows that the worst-case objective and the average case 
objective are two extreme cases of our optimization objec- 
tive. We also discuss under what conditions each optimality 
measure makes sense. 

In previous work (e.g., [4], [1], [8], [2]), many modulation 
codes are shown to be asymptotically optimal as the number 
of cell-levels q goes to infinity. But the condition that $q \to \infty$
cannot be satisfied in practical systems. Therefore, we
also analyze asymptotically optimal modulation codes when 
q is only moderately large using the results from load- 
balancing theory [13], [10], [11]. This suggests an enhanced 
algorithm that improves the performance of practical systems
significantly. Theoretical analysis and simulation results show 
that this algorithm performs better than other asymptotically 
optimal algorithms when q is moderately large. 

The structure of the paper is as follows. The system model
and performance evaluation metrics are discussed in Section
II. An asymptotically optimal modulation code, which is
universal over arbitrary i.i.d. input distributions, is proposed
in Section III. The storage efficiency of this asymptotically
optimal modulation code is analyzed in Section IV. An
enhanced modulation code is also presented in Section IV,
and its storage efficiency is analyzed there as well.
Simulation results and comparisons are presented in
Section V. The paper is concluded in the final section.

II. System Model 

A. System Description 

Flash memory devices usually rely on error detect- 
ing/correcting codes to ensure a low error rate. So far, 
practical systems tend to use Bose-Chaudhuri-Hocquenghem 
(BCH) and Reed-Solomon (RS) codes. The error-correcting 
codes (ECC's) are used as the outer codes while the modu- 
lation codes are the inner codes. In this paper, we focus on 
the modulation codes and ignore the noise and the design of 
ECC for now. 



Let us assume that a block contains $n \times N$ q-level cells
and that n cells (called an n-cell) are used together to store
k l-ary variables (called a k-variable). A block contains N
n-cells and the N k-variables are assumed to be i.i.d. random
variables. We assume that all the k-variables are updated
together randomly at the same time and the new values are
stored in the corresponding n-cells. This is a reasonable
assumption in a system with an outer ECC. We use the
subscript t to denote the time index and each rewrite increases
t by 1. When we discuss a modulation code, we focus on a
single n-cell. (The encoder of the modulation code increases
some of the cell-levels based on the current cell-levels and
the new value of the k-variable.) Remember that cell-levels
can only be increased during a rewrite. So, when any
cell-level must be increased beyond the maximum value $q - 1$,
the whole block is erased and all the cell levels are reset to
zero. We let the maximal allowable number of block-erasures
be M and assume that after M block erasures, the device
becomes unreliable.

Assume the k-variable written at time t is a random
variable $x_t$ sampled from the set $\{0, 1, \dots, l^k - 1\}$ with
distribution $p_X(x)$. For convenience, we also represent the
k-variable at time t in vector form as $x_t \in Z_l^k$, where
$Z_l$ denotes the set of integers modulo l. The cell-state vector
at time t is denoted as $s_t = (s_t(0), s_t(1), \dots, s_t(n-1))$
and $s_t(i) \in Z_q$ denotes the charge level of the i-th cell at
time t. When we say $s_i \succeq s_j$, we mean $s_i(m) \ge s_j(m)$ for
$m = 0, 1, \dots, n - 1$. Since the charge level of a cell can
only be increased, continuous use of the memory implies that
an erasure of the whole block will be required at some point.
Although writes, reads and erasures can all introduce noise
into the memory, we neglect this and assume that the writes,
reads and erasures are noise-free.

Consider writing information to a flash memory when the
encoder knows the previous cell state $s_{t-1}$, the current
k-variable $x_t$, and an encoding function
$f : Z_l^k \times Z_q^n \to Z_q^n$ that
maps $x_t$ and $s_{t-1}$ to a new cell-state vector $s_t$. The decoder
only knows the current cell state $s_t$ and the decoding function
$g : Z_q^n \to Z_l^k$ that maps the cell state $s_t$ back to the variable
vector $x_t$. Of course, the encoding and decoding functions
could change over time to improve performance, but we
only consider time-invariant encoding/decoding functions for
simplicity.

B. Performance Metrics 

1) Lifetime vs. Storage Efficiency: The idea of designing
efficient modulation codes jointly to store multiple variables
in multiple cells was introduced by Jiang [7]. In previous
work on modulation code design for flash memory (e.g., [7],
[1], [2], [4]), the lifetime of the memory (either worst-case or
average) is maximized given a fixed amount of information per
rewrite. Improving storage density and extending the lifetime
of the device are two conflicting objectives. One can either
fix one and optimize the other or optimize over these two
jointly. Most previous work (e.g., [4], [1], [8], [2]) takes the
first approach by fixing the amount of information for each
rewrite and maximizing the number of rewrites between two
erasures. In this paper, we consider the latter approach and
our objective is to maximize the total amount of information
stored in the device until the device dies. This is equivalent
to maximizing the average (over the k-variable distribution
$p_X(x)$) amount of information stored per cell-level,

\[ \gamma \triangleq E\left[ \frac{\sum_{i=1}^{R} I_i}{n(q-1)} \right], \]

where $I_i$ is the amount of information stored at the i-th
rewrite, R is the number of rewrites between two erasures,
and the expectation is over the k-variable distribution. We
call $\gamma$ the storage efficiency.

2) Worst Case vs. Average Case: In previous work on
modulation codes for flash memory, the number of rewrites
of an n-cell has been maximized in two different ways. The
authors in [7], [1], [2] consider the worst-case number of
rewrites and the authors in [4] consider the average number
of rewrites. As mentioned in [4], the reason for considering
the average case is the large number of erasures in
the lifetime of a flash memory device. Interestingly, these
two considerations can be seen as two extreme cases of the
optimization objective in (1).

Let the k-variables be a sequence of i.i.d. random variables
over time and all the n-cells. The objective of optimization
is to maximize the amount of information stored until the
device dies. The total amount of information stored in the
device¹ can be upper-bounded by

\[ W = \sum_{i=1}^{M} R_i \log_2(l^k) \tag{1} \]

where $R_i$ is the number of rewrites between the $(i-1)$-th
and the i-th erasures. Note that the upper bound in (1) is
achievable by a uniform input distribution, i.e., when the input
k-variable is uniformly distributed over $Z_{l^k}$, each rewrite
stores $\log_2(l^k) = k \log_2 l$ bits of information. Due to the
i.i.d. property of the input variables over time, the $R_i$'s are i.i.d.
random variables over time. Since the $R_i$'s are i.i.d. over time,
we can drop the subscript i. Since M, which is the maximum 
number of erasures allowed, is approximately on the order 
of 10^, by the law of large numbers (LLN), we have 

\[ W \approx M E[R]\, k \log_2(l). \]

Let the set of all valid encoder/decoder pairs be

\[ Q = \{ f, g \mid s_t = f(s_{t-1}, x_t),\ x_t = g(s_t),\ s_t \succeq s_{t-1} \}, \]

where $s_t \succeq s_{t-1}$ implies the charge levels are element-wise
non-decreasing. This allows us to treat the problem

\[ \max_{f,g \in Q} W \]

as the following equivalent problem

\[ \max_{f,g \in Q} E[R]\, k \log_2(l). \tag{2} \]

¹ There is a subtlety here. If the n-cell changes to the same value, should
it count as stored information? Should this count as a rewrite? This formula
assumes that it counts as a rewrite, so that $l^k$ values (rather than $l^k - 1$)
can be stored during each rewrite.



Denote the maximal charge level of the i-th n-cell at time
t as $d_i(t)$. Note that time index t is reset to zero when a
block erasure occurs and increased by one at each rewrite
otherwise. Denote the maximal charge level in a block at
time t as $d(t)$, which can be calculated as $d(t) = \max_i d_i(t)$.
Define $t_i$ as the time when the i-th n-cell reaches its maximal
allowed value, i.e., $t_i = \min\{t \mid d_i(t) = q\}$. We assume,
perhaps naively, that a block-erasure is required when any cell
within a block reaches its maximum allowed value. The time
when a block erasure is required is defined as $T = \min_i t_i$. It
is easy to see that $E[R] = N E[T]$, where the expectations
are over the k-variable distribution. So maximizing $E[T]$ is
equivalent to maximizing W. So the optimization problem
(2) can be written as the following optimization problem

\[ \max_{f,g \in Q} E\left[ \min_{i \in \{1, 2, \dots, N\}} t_i \right]. \tag{3} \]


Under the assumption that the input is i.i.d. over all the
n-cells and time indices, one finds that the $t_i$'s are i.i.d. random
variables. Let their common probability density function
(pdf) be $f_t(x)$. It is easy to see that T is the minimum
of N i.i.d. random variables with pdf $f_t(x)$. Therefore,
we have $f_T(x) = N f_t(x) (1 - F_t(x))^{N-1}$, where $F_t(x)$
is the cumulative distribution function (cdf) of $t_i$. So, the
optimization problem (3) becomes

\[ \max_{f,g \in Q} E[T] = \max_{f,g \in Q} \int N f_t(x) (1 - F_t(x))^{N-1} x \, dx. \tag{4} \]

Note that when N = 1, the optimization problem in (4)
simplifies to

\[ \max_{f,g \in Q} E[t_i]. \tag{5} \]

This is essentially the case that the authors in [4] consider. 
When the whole block is used as one n-cell and the number 
of erasures allowed is large, optimizing the average (over all 
input sequences) number of rewrites of an n-cell is equivalent 
to maximizing the total amount of information stored W. 
The analysis also shows that the reason we consider average 
performance is not only due to the averaging effect caused by 
the large number of erasures. One other important assumption 
is that there is only one n-cell per block. 

The other extreme is when $N \gg 1$. In this case, the
pdf $N f_t(x) (1 - F_t(x))^{N-1}$ tends to a point mass at the
minimum of t and the integral $\int N f_t(x) (1 - F_t(x))^{N-1} x \, dx$
approaches the minimum of t. This gives the worst-case
stopping time for the programming process of an n-cell. This
case is considered by [7], [1], [2]. Our analysis shows that
we should consider the worst case when $N \gg 1$ even though
the device experiences a large number of erasures. So the
optimality measure is not determined only by M, but also by
N. When N and M are large, it makes more sense to consider
the worst-case performance. When N = 1, it is better to
consider the average performance. When N is moderately
large, we should maximize the number of rewrites using (4),
which balances the worst case and the average case.
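
The transition between the two extremes can be illustrated numerically. The following Python sketch evaluates a discrete analogue of (4), namely E[T] = sum over t of Pr{t_i > t}^N for integer-valued stopping times. The toy distribution of t_i and the function name `expected_block_lifetime` are illustrative assumptions and are not taken from the paper.

```python
def expected_block_lifetime(pmf, num_ncells):
    """E[min of num_ncells i.i.d. stopping times].

    pmf[t] is Pr{t_i = t} for t = 0, 1, 2, ... (a hypothetical input).
    Uses E[T] = sum_{t >= 0} Pr{T > t} and Pr{T > t} = Pr{t_i > t}^N.
    """
    survival = 1.0   # Pr{t_i > t}, updated as t advances
    total = 0.0
    for p in pmf:
        survival = max(survival - p, 0.0)   # guard against rounding below 0
        total += survival ** num_ncells
    return total


# Toy example: t_i uniform on {4, 5, 6, 7, 8} (assumed, for illustration).
toy_pmf = [0.0, 0.0, 0.0, 0.0, 0.2, 0.2, 0.2, 0.2, 0.2]
for n_cells in (1, 2, 10, 1000):
    print(n_cells, expected_block_lifetime(toy_pmf, n_cells))
```

With N = 1 the result is the mean of t_i (the average-case objective), and as N grows it approaches the smallest value in the support (the worst-case objective).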

When N is moderately large, one should probably focus
on optimizing the function in (4), but it is not clear how to
do this directly. So, this remains an open problem for future
research. Instead, we will consider a load-balancing approach
to improve practical systems where q is moderately large.

C. N = 1 vs. N > 1

If we assume that there is only one variable changed each
time, the average amount of information per cell-level can be
bounded by $\log_2 kl$ because there are $kl$ possible new values.
Since the number of rewrites can be bounded by $n(q-1)$,
we have

\[ \gamma \le \log_2 kl. \tag{6} \]

If we allow arbitrary changes to the k-variables, there are
$l^k$ possible new values in total. It can be shown that

\[ \gamma \le k \log_2 l. \tag{7} \]

For fixed l and q, the bound in (7) suggests that using a large
k can improve the storage efficiency. This is also the reason
jointly coding over multiple cells can improve the storage
efficiency [7]. Since optimal rewriting schemes only allow
a single cell-level to increase by one during each rewrite,
decodability implies that $n \ge kl - 1$ for the first case and
$n \ge l^k - 1$ for the second case. Therefore, the bounds in (6)
and (7) also require large n to improve storage efficiency.

The upper bound in (7) grows linearly with k while the
upper bound in (6) grows logarithmically with k. Therefore,
in the remainder of this paper, we assume an arbitrary change
in the k-variable per rewrite and N = 1, i.e., the whole block
is used as an n-cell, to improve the storage efficiency. This
approach implicitly trades instantaneous capacity for future
storage capacity because more cells are used to store the same
number of bits, but the cells can also be reused many more
times.
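
As a quick numerical illustration of how the two bounds scale, the sketch below evaluates the right-hand sides of (6) and (7) for binary variables. The function names are hypothetical and chosen only for this example.

```python
import math


def bound_single_change(k, l):
    """Right-hand side of (6): log2(k * l), one variable changes per rewrite."""
    return math.log2(k * l)


def bound_arbitrary_change(k, l):
    """Right-hand side of (7): k * log2(l), arbitrary change per rewrite."""
    return k * math.log2(l)


# Binary variables (l = 2) with a growing number of variables k.
for k in (2, 4, 8, 16):
    print(k, bound_single_change(k, 2), bound_arbitrary_change(k, 2))
```

For l = 2 the two bounds coincide at k = 2, and the bound for arbitrary changes pulls ahead linearly as k increases.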

Note that the assumption of N = 1 might be difficult for
real implementation, but its analysis gives an upper bound on
the storage efficiency. From the analysis above with N = 1,
we also know that maximizing $\gamma$ is equivalent to maximizing
the average number of rewrites.

III. Self-randomized Modulation Codes 

In [4], modulation codes are proposed that are asymptotically
optimal (as q goes to infinity) in the average sense when
k = 2. In this section, we introduce a modulation code that
is asymptotically optimal for arbitrary input distributions and
arbitrary k and l. This rewriting algorithm can be seen as an
extension of the one in [4]. The goal is to increase the
cell-levels uniformly on average for an arbitrary input distribution.
Of course, decodability must be maintained. The solution
is to use common information, known to both the encoder
(to encode the input value) and the decoder (to ensure the
decodability), to randomize the cell index over time for each
particular input value.

Let us assume the k-variable is an i.i.d. random variable
over time with arbitrary distribution $p_X(x)$ and the k-variable
at time t is denoted as $x_t \in Z_{l^k}$. The output of the decoder
is denoted as $\hat{x}_t \in Z_{l^k}$. We choose $n = l^k$ and let the cell
state vector at time t be $s_t = (s_t(0), s_t(1), \dots, s_t(n-1))$,
where $s_t(i) \in Z_q$ is the charge level of the i-th cell at time
t. At t = 0, the variables are initialized to $s_0 = (0, \dots, 0)$,
$x_0 = 0$ and $r_0 = 0$.

The decoding algorithm $\hat{x}_t = g(s_t)$ is described as follows.

• Step 1: Read cell state vector $s_t$ and calculate the $\ell_1$
norm $r_t = \|s_t\|_1$.

• Step 2: Calculate $\hat{s}_t = \sum_{i=1}^{n-1} i\, s_t(i)$ and
$\hat{x}_t = \hat{s}_t - \frac{r_t (r_t + 1)}{2} \bmod l^k$.

The encoding algorithm $s_t = f(s_{t-1}, x_t)$ is described as
follows.

• Step 1: Read cell state $s_{t-1}$ and calculate $r_{t-1}$ and $\hat{x}_{t-1}$
as above. If $\hat{x}_{t-1} = x_t$, then do nothing.

• Step 2: Calculate $\Delta x_t = x_t - \hat{x}_{t-1} \bmod l^k$ and
$w_t = \Delta x_t + r_{t-1} + 1 \bmod l^k$.

• Step 3: Increase the charge level of the $w_t$-th cell by 1.

For convenience, in the rest of the paper, we refer to the above
rewriting algorithm as the "self-randomized modulation code".
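
The decoding and encoding steps of the self-randomized modulation code can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names, the list representation of the cell state, and the convention of returning `None` when a block erasure would be needed are all choices made here. The offset r(r + 1)/2 subtracted by the decoder is the sum of the randomizing offsets 1, 2, ..., r added by the encoder over the first r writes.

```python
def decode(state, l, k):
    """Recover the stored k-variable from the cell-state vector."""
    n = l ** k
    r = sum(state)                                          # l1 norm r_t
    weighted = sum(i * level for i, level in enumerate(state))
    return (weighted - r * (r + 1) // 2) % n


def encode(state, x, l, k, q):
    """Return the new cell state storing x, or None if an erasure is needed.

    Exactly one cell is raised by one level unless x is already stored.
    """
    n = l ** k
    x_prev = decode(state, l, k)
    if x_prev == x:
        return list(state)               # Step 1: nothing to do
    r_prev = sum(state)
    delta = (x - x_prev) % n             # Step 2: change in the stored value
    w = (delta + r_prev + 1) % n         # randomized cell index w_t
    if state[w] == q - 1:
        return None                      # cell is full: block erasure required
    new_state = list(state)
    new_state[w] += 1                    # Step 3
    return new_state


# Usage: l = 2, k = 3 gives n = 8 cells; q = 8 levels per cell (assumed values).
cells = [0] * 8
for value in [5, 2, 2, 7, 0, 3]:
    cells = encode(cells, value, l=2, k=3, q=8)
    print(value, decode(cells, 2, 3), cells)
```

Note that the all-zero state decodes to 0, which matches the initialization described above.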

Theorem 1: The self-randomized modulation code
achieves at least $n(q - o(q))$ rewrites with high probability,
as $q \to \infty$, for arbitrary k, l, and i.i.d. input distribution
$p_X(x)$. Therefore, it is asymptotically optimal for random
inputs as $q \to \infty$.

Proof: [Sketch of Proof] The proof is similar to the
proof in [4]. Since exactly one cell has its level increased by
1 during each rewrite, $r_t$ is an integer sequence that increases
by 1 at each rewrite. The cell index to be written, $w_t$, is
randomized by adding the value $(r_t + 1) \bmod l^k$. This causes
each consecutive sequence of rewrites to have a uniform
effect on all cell levels. As $q \to \infty$, an unbounded number
of rewrites is possible and we can assume $t \to \infty$.

Consider the first $n(q - o(q))$ steps. The value
$a_{t,k,l} = (r_t + 1) \bmod l^k$ is as even as possible over
$\{0, 1, \dots, l^k - 1\}$. For convenience, we say there are
$q - o(q)$ occurrences of $a_{t,k,l}$ at each value,
as the rounding difference by 1 is absorbed in the $o(q)$ term.
Assume the input distribution is $p_X = \{p_0, p_1, \dots, p_{l^k-1}\}$.
For the case that $a_{t,k,l} = i$, the probability that $w_t = j$ is
$p_{(j-i) \bmod l^k}$ for $j \in \{0, 1, \dots, l^k - 1\}$. Therefore, $w_t$ has
a uniform distribution over $\{0, 1, \dots, l^k - 1\}$. Since inputs
are independent over time, by applying the same Chernoff
bound argument as [4], it follows that the number of times
$w_t = j$ is at most $q - 3$ with high probability (larger than
$1 - 1/\mathrm{poly}(q)$) for all j. Summing over j, we finish the proof.

■

Remark 1: Notice that the randomizing term $r_t$ is a
deterministic term which makes $w_t$ look random over time in
the sense that there are equally many terms for each value.
Moreover, $r_t$ is known to both the encoder and the decoder,
so the encoder can generate "uniform" cell indices
over time and the decoder, which knows the accumulated value
of $r_t$, can subtract it out and recover the data correctly. Although
this algorithm is asymptotically optimal as $q \to \infty$, the
maximum number of rewrites $n(q - o(q))$ cannot be achieved
for moderate q. This motivates the analysis and the design of
an enhanced version of this algorithm for practical systems
in the next section.

Remark 2: A self-randomized modulation code uses $n = l^k$
cells to store a k-variable. This is much larger than the
$n = kl$ used by previous asymptotically optimal algorithms
because we allow the k-variable to change arbitrarily.
Although this seems to be a waste of cells, the average amount
of information stored per cell-level is actually maximized (see
(6) and (7)). In fact, the definition of asymptotic optimality
requires $n \ge l^k - 1$ if we allow arbitrary changes to the
k-variable.

Remark 3: We note that the optimality of the self- 
randomized modulation codes is similar to the weak robust 
codes presented in [9]. 

Remark 4: We use $n = l^k$ cells to store one of $l^k - 1$
possible messages. This is slightly worse than the simple
method of using $n = l^k - 1$. Is it possible to have
self-randomization using only $n = l^k - 1$ cells? A preliminary
analysis of this question based on group theory indicates
that it is not. Thus, the extra cell provides the possibility
to randomize the mappings between message values and the
cell indices over time.

IV. Load-balancing Modulation Codes 

While asymptotically optimal modulation codes (e.g.,
codes in [7], [1], [2], [4] and the self-randomized modulation
codes described in Section III) require $q \to \infty$, practical
systems use q values between 2 and 256. Compared to
the number of cells n, the size of q is not quite large 
enough for asymptotic optimality to suffice. In other words, 
codes that are asymptotically optimal may have significantly 
suboptimal performance when the system parameters are 
not large enough. Moreover, different asymptotically optimal 
codes may perform differently when q is not large enough. 
Therefore, asymptotic optimality can be misleading in this 
case. In this section, we first analyze the storage efficiency of 
self-randomized modulation codes when q is not large enough 
and then propose an enhanced algorithm which improves the 
storage efficiency significantly. 

A. Analysis for Moderately Large q 

Before we analyze the storage efficiency of asymptotically 
optimal modulation codes for moderately large q, we first 
show the connection between rewriting process and the load- 
balancing problem (aka the balls-into-bins or balls-and-bins 
problem) which is well studied in mathematics and com- 
puter science [13], [10], [11]. Basically, the load-balancing 
problem considers how to distribute objects among a set of 
locations as evenly as possible. Specifically, the balls-and- 
bins model considers the following problem. Suppose m balls are
thrown into n bins, with each ball being placed into a bin
chosen independently and uniformly at random, and define the
load as the number of balls in a bin. What is the maximal
load over all the bins? Based on the results in Theorem 1
in [11], we take a simpler and less accurate approach to the 
balls-into-bins problem and arrive at the following theorem. 

Theorem 2: Suppose that m balls are sequentially placed
into n bins. Each time a bin is chosen independently and
uniformly at random. The maximal load over all the bins is
L and:

(i) If $m = d_1 n$, the maximally loaded bin has
$L \le \frac{c_1 \ln n}{\ln \ln n}$ balls, $c_1 > 2$ and $d_1 > 1$, with high probability
($1 - 1/\mathrm{poly}(n)$) as $n \to \infty$.

(ii) If $m = n \ln n$, the maximally loaded bin has
$L \le \frac{c_4 (\ln n)^2}{\ln \ln n}$ balls, $c_4 > 1$, with high probability
($1 - 1/\mathrm{poly}(n)$) as $n \to \infty$.

(iii) If $m = c_3 n^{d_2}$, the maximally loaded bin has
$L \le e c_3 n^{d_2 - 1} + c_2 \ln n$, $c_2 > 1$, $c_3 > 1$ and $d_2 > 1$, with high
probability ($1 - 1/\mathrm{poly}(n)$) as $n \to \infty$.

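Proof: Denote the event that there are at least k balls
in a particular bin as $E_k$. Using the union bound over all
subsets of size k, it is easy to show that the probability that
$E_k$ occurs is upper bounded by

\[ \Pr\{E_k\} \le \binom{m}{k} \left( \frac{1}{n} \right)^k. \]

Using Stirling's formula, we have $\binom{m}{k} \le \left( \frac{me}{k} \right)^k$. Then
$\Pr\{E_k\}$ can be further bounded by

\[ \Pr\{E_k\} \le \left( \frac{me}{nk} \right)^k. \tag{8} \]

If $m = d_1 n$, substitute $k = \frac{c_1 \ln n}{\ln \ln n}$ into the RHS of (8); we
have

\[
\begin{aligned}
\Pr\{E_k\} &\le \left( \frac{d_1 e \ln \ln n}{c_1 \ln n} \right)^{\frac{c_1 \ln n}{\ln \ln n}} \\
&= \exp\left( \frac{c_1 \ln n}{\ln \ln n} \left( \ln(d_1 e \ln \ln n) - \ln(c_1 \ln n) \right) \right) \\
&\le \exp\left( \frac{c_1 \ln n}{\ln \ln n} \left( \ln(d_1 e \ln \ln n) - \ln \ln n \right) \right) \\
&\le \frac{1}{n^{c_1 - 1}}.
\end{aligned}
\]

Denote the event that all bins have at most k balls as $\bar{E}_k$. By
applying the union bound, it is shown that

\[ \Pr\{\bar{E}_k\} \ge 1 - \frac{n}{n^{c_1 - 1}} = 1 - \frac{1}{n^{c_1 - 2}}. \]

Since $c_1 > 2$, we finish the proof for the case of $m = d_1 n$.

If $m = n \ln n$, substitute $k = \frac{c_4 (\ln n)^2}{\ln \ln n}$ into the RHS of (8);
we have

\[
\begin{aligned}
\Pr\{E_k\} &\le \left( \frac{e \ln \ln n}{c_4 \ln n} \right)^{\frac{c_4 (\ln n)^2}{\ln \ln n}} \\
&= \exp\left( \frac{c_4 (\ln n)^2}{\ln \ln n} \left( \ln(e \ln \ln n) - \ln(c_4 \ln n) \right) \right) \\
&\le \exp\left( \frac{c_4 (\ln n)^2}{\ln \ln n} \left( \ln(e \ln \ln n) - \ln \ln n \right) \right) \\
&\le e^{-(c_4 - 1)(\ln n)^2}.
\end{aligned}
\]

By applying the union bound, we finish the proof for the case
of $m = n \ln n$.

If $m = c_3 n^{d_2}$, substitute $k = e c_3 n^{d_2 - 1} + c_2 \ln n$ into the RHS
of (8); we have

\[
\begin{aligned}
\Pr\{E_k\} &\le \left( \frac{c_3 e n^{d_2 - 1}}{e c_3 n^{d_2 - 1} + c_2 \ln n} \right)^{e c_3 n^{d_2 - 1} + c_2 \ln n} \\
&= \exp\left( (c_5 n^{d_2 - 1} + c_2 \ln n) \left( \ln(c_5 n^{d_2 - 1}) - \ln(c_5 n^{d_2 - 1} + c_2 \ln n) \right) \right) \\
&\le \exp\left( (c_5 n^{d_2 - 1} + c_2 \ln n) \left( - \frac{c_2 \ln n}{c_5 n^{d_2 - 1} + c_2 \ln n} \right) \right) \\
&= e^{-c_2 \ln n},
\end{aligned}
\]

where $c_5 = e c_3$ and the last inequality uses $\ln(1 - x) \le -x$. By applying
the union bound, it is shown that

\[ \Pr\{\bar{E}_k\} \ge 1 - \frac{n}{n^{c_2}} = 1 - \frac{1}{n^{c_2 - 1}}. \]

Since $c_2 > 1$, we finish the proof for the case of $m = c_3 n^{d_2}$.

■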

Remark 5: Note that Theorem 2 only shows an upper
bound on the maximum load L with a simple proof. More
precise results can be found in Theorem 1 of [11], where
the exact order of L is given for different cases. It is worth
mentioning that the results in Theorem 1 of [11] are different
from Theorem 2 because Theorem 1 of [11] holds with
probability $1 - o(1)$ while Theorem 2 holds with probability
$1 - 1/\mathrm{poly}(n)$.
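
The balls-into-bins setting of Theorem 2 is easy to explore by simulation. The sketch below throws m = n balls into n bins and compares the observed maximal load with the case (i) bound $c_1 \ln n / \ln \ln n$. The seed, the value n = 1000, and the constant c1 = 3 are arbitrary choices for illustration, and the helper names are hypothetical.

```python
import math
import random


def max_load(num_balls, num_bins, rng):
    """Place each ball in a uniformly random bin; return the maximal load L."""
    loads = [0] * num_bins
    for _ in range(num_balls):
        loads[rng.randrange(num_bins)] += 1
    return max(loads)


def bound_case_i(num_bins, c1):
    """Upper bound c1 * ln n / ln ln n from Theorem 2(i); needs c1 > 2."""
    return c1 * math.log(num_bins) / math.log(math.log(num_bins))


rng = random.Random(0)            # fixed seed, only for reproducibility
n = 1000
observed = max_load(n, n, rng)    # the case m = n
print(observed, bound_case_i(n, c1=3.0))
```

Even though the average load is 1 here, the maximal load is noticeably larger, which is the effect that limits the number of rewrites when q is small.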

Remark 6: The asymptotic optimality in the rewriting 
process implies that each rewrite only increases the cell-level 
of a cell by 1 and all the cell-levels are fully used when 
an erasure occurs. This actually implies lim,„^oo = 1- 
Since n is usually a large number and q is not large enough in 
practice, the theorem shows that, when q is not large enough, 
asymptotic optimality is not achievable. For example, in 
practical systems, the number of cell-levels q does not depend 
on the number of cells in a block. Therefore, rather than 
n{q — 1), only roughly 71(17 ^ 1) charge levels can be 
used as $n \to \infty$ if q is a small constant which is independent
of n. In practice, this loss could be mitigated by using writes 
that increase the charge level in multiple cells simultaneously 
(instead of erasing the block). 

Theorem 3: The self-randomized modulation code has 
storage efficiency 7 = cln when q — 1 — c and 
7 — |A;ln^ when g — 1 = clnrt as 71 goes to infinity with 
high probability (i.e., 1 — o(l)). 

Proof: Consider the problem of throwing m balls into n
bins and let the r.v. M be the number of balls thrown into n
bins until some bin has more than $q - 1$ balls in it. While we
would like to calculate $E[M]$ exactly, we settle for an
approximation based on the following result. If $m = cn \ln n$,
then there is a constant $d(c)$ such that the maximum number of
balls L in any bin satisfies

\[ (d(c) - 1) \ln n \le L \le d(c) \ln n \]

with probability $1 - o(1)$ as $n \to \infty$ [11]. The constant $d(c)$
is given by the largest x-root of

\[ x (\ln c - \ln x + 1) + 1 - c = 0, \]

and solving this equation for c gives the implicit expression
$c = -d(c)\, W\left( -e^{-1 - 1/d(c)} \right)$, where W denotes the
Lambert W function. Since the lower bound matches
the expected maximum value better, we define $\theta = d(c) - 1$
and apply it to our problem using the equation $\theta \ln n = q - 1$
or $\theta = \frac{q-1}{\ln n}$. Therefore, the storage efficiency is $\gamma$ =

In n 



n(g-l) 

If m 



cn, the maximum load is approximately 
with probability 1 — o(l) for large n [11]. By definition. 



9 - 1 = 



m In n 
"(■7-1) 



In 71 



Therefore, the storage efficiency is 7 = 



= clnis^ = cln 



feln( 



Remark 7: The results in Theorem 3 show that when q
is on the order of $O(\ln n)$, the storage efficiency is on the
order of $\Theta(k \ln l)$. Taking the limit as $q, n \to \infty$ with q =
O(lnn), we have lim = ^ > 0. When g is a constant 
independent of n, the storage efficiency is on the order of 
Q (in kin I). Taking the limit as n ^ 00 with q — 1 = c, 
we have lim = 0. In this regime, the self-randomized 
modulation codes actually perform very poorly even though 
they are asymptotically optimal as $q \to \infty$.
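
The constant d(c) used in the proof of Theorem 3 has no closed form, but it is simple to compute numerically. The sketch below assumes that d(c) is the largest root of x(ln c - ln x + 1) + 1 - c = 0 and finds it by bisection. It then reports c/θ with θ = d(c) - 1, which is the ratio of the number of rewrites m = cn ln n to the available cell-levels n(q - 1) = nθ ln n. The function names are hypothetical.

```python
import math


def d_of_c(c, tol=1e-12):
    """Largest root of x(ln c - ln x + 1) + 1 - c = 0, found by bisection.

    The left-hand side equals 1 at x = c and is strictly decreasing for
    x > c, so the largest root lies to the right of c.
    """
    def f(x):
        return x * (math.log(c) - math.log(x) + 1.0) + 1.0 - c

    lo, hi = c, 2.0 * c + 2.0
    while f(hi) > 0.0:            # expand until the sign changes
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def used_level_fraction(c):
    """c / theta with theta = d(c) - 1: rewrites divided by n(q - 1)."""
    return c / (d_of_c(c) - 1.0)


for c in (0.5, 1.0, 2.0, 10.0):
    print(c, d_of_c(c), used_level_fraction(c))
```

For c = 1 the equation reduces to x(1 - ln x) = 0, so d(1) = e, which gives a convenient sanity check.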



B. Load-balancing Modulation Codes 

Considering the bins-and-balls problem, can we distribute
balls more evenly when m/n is on the order of $o(n)$?
Fortunately, when m = n, the maximal load can be reduced
by a factor of roughly $\frac{\ln n}{(\ln \ln n)^2}$ by using the power of two
random choices [10]. In detail, the strategy is as follows: for
every ball, we pick two bins independently and uniformly at random
and throw the ball into the less loaded bin. By doing this,
the maximally loaded bin has roughly $\frac{\ln \ln n}{\ln 2} + O(1)$ balls
with high probability. Theorem 1 in [13] gives the answer
in a general form when we consider d random choices. The
theorem shows there is a large gain when the number of
random choices is increased from 1 to 2. Beyond that, the gain
is on the same order and only the constant can be improved.
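
The effect of a second random choice can be observed with a small simulation. The sketch below implements the purely random d-choice strategy described above (not the modulation code itself) and compares d = 1 with d = 2 for m = n. The seeds and the value n = 2000 are arbitrary illustrative choices, and the function name is hypothetical.

```python
import random


def max_load_d_choices(num_balls, num_bins, d, rng):
    """For each ball, sample d bins uniformly and use the least loaded one."""
    loads = [0] * num_bins
    for _ in range(num_balls):
        candidates = [rng.randrange(num_bins) for _ in range(d)]
        target = min(candidates, key=loads.__getitem__)  # ties: first sampled
        loads[target] += 1
    return max(loads)


n = 2000
one_choice = max_load_d_choices(n, n, 1, random.Random(1))
two_choices = max_load_d_choices(n, n, 2, random.Random(2))
print(one_choice, two_choices)
```

With one choice the maximal load grows like ln n / ln ln n, while with two choices it grows like ln ln n / ln 2, so the gap widens slowly as n increases.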

Based on the idea of 2 random choices, we define the
following load-balanced modulation code.

Again, we let the cell state vector at time t be
$s_t = (s_t(0), s_t(1), \dots, s_t(n-1))$, where $s_t(i) \in Z_q$ is the charge
level of the i-th cell at time t. This time, we use
$n = l^{k+1}$ cells to store a k-variable $x_t \in Z_{l^k}$ (i.e., we write
$(k+1) \log_2 l$ bits to store $k \log_2 l$ bits of information). The
information loss provides l ways to write the same value.
This flexibility allows us to avoid sequences of writes that
increase one cell level too much. We are primarily interested
in binary variables with 2 random choices, i.e., l = 2. For the
power of l choices to be effective, we must try to randomize
(over time) the l possible choices over the set of all $\binom{n}{l}$
possibilities. The value $r_t = \|s_t\|_1$ is used to do this. Let H
be the Galois field with $l^{k+1}$ elements and $h : Z_{l^{k+1}} \to H$
be a bijection that satisfies $h(0) = 0$ (i.e., the Galois field
element 0 is associated with the integer 0).

The decoding algorithm calculates $\hat{x}_t$ from $s_t$ and operates
as follows:

• Step 1: Read cell state vector $s_t$ and calculate the $\ell_1$
norm $r_t = \|s_t\|_1$.

• Step 2: Calculate $\hat{s}_t = \sum_{i=1}^{n-1} i\, s_t(i)$ and
$x'_t = \hat{s}_t \bmod l^{k+1}$.

• Step 3: Calculate $a_t = h\left( (r_t \bmod (l^{k+1} - 1)) + 1 \right)$ and
$b_t = h\left( r_t \bmod l^{k+1} \right)$.

• Step 4: Calculate $\hat{x}_t = h^{-1}\left( a_t^{-1} (h(x'_t) - b_t) \right) \bmod l^k$.

The encoding algorithm stores $x_t$ and operates as follows.

• Step 1: Read cell state $s_{t-1}$ and decode to $x'_{t-1}$ and
$\hat{x}_{t-1}$. If $\hat{x}_{t-1} = x_t$, then do nothing.

• Step 2: Calculate $r_t = \|s_{t-1}\|_1 + 1$,
$a_t = h\left( (r_t \bmod (l^{k+1} - 1)) + 1 \right)$, and $b_t = h\left( r_t \bmod l^{k+1} \right)$.

• Step 3: Calculate $x_t^{(i)} = h^{-1}\left( a_t h(x_t + i l^k) + b_t \right)$ and
$\Delta x_t^{(i)} = x_t^{(i)} - x'_{t-1} \bmod l^{k+1}$ for $i = 0, 1, \dots, l - 1$.

• Step 4: Calculate² $w_t = \arg\min_{j \in Z_l} \{ s_{t-1}(\Delta x_t^{(j)}) \}$.
Increase the charge level by 1 of cell $\Delta x_t^{(w_t)}$.

Note that the state vector at t = 0 is initialized to $s_0 =
(0, \dots, 0)$ and therefore $x_0 = 0$. The first arbitrary value that
can be stored is $x_1$.
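
The load-balancing steps above can be sketched in Python for the binary case. This is a minimal illustration under several assumptions that are not specified in the text: l = 2 and k = 2 (so n = 8 cells), the field H = GF(8) built from the polynomial x^3 + x + 1, and the bijection h taken to be the binary representation of an integer, so that field addition and subtraction are both XOR. All names are hypothetical.

```python
M = 3                      # H = GF(2^M) with M = k + 1 (assumes l = 2, k = 2)
POLY = 0b1011              # x^3 + x + 1, irreducible over GF(2) (assumed)
N_CELLS = 1 << M           # n = l^(k+1) = 8 cells
NUM_VALUES = 1 << (M - 1)  # l^k = 4 values can be stored


def gf_mul(a, b):
    """Multiply two elements of GF(2^M), stored as bit patterns."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= POLY          # reduce modulo the field polynomial
    return result


def gf_inv(a):
    """Multiplicative inverse of a nonzero element, by exhaustive search."""
    return next(c for c in range(1, 1 << M) if gf_mul(a, c) == 1)


def affine_params(r):
    """a_t (always nonzero) and b_t derived from the l1 norm r."""
    return (r % (N_CELLS - 1)) + 1, r % N_CELLS


def decode(state):
    """Return (x_hat, x_prime) for the given cell-state vector."""
    r = sum(state)
    x_prime = sum(i * level for i, level in enumerate(state)) % N_CELLS
    a, b = affine_params(r)
    y = gf_mul(gf_inv(a), x_prime ^ b)     # h^-1(a^-1 (h(x') - b))
    return y % NUM_VALUES, x_prime


def encode(state, x, q):
    """Store x by raising the less charged of two candidate cells."""
    x_hat, x_prime = decode(state)
    if x_hat == x:
        return list(state)
    a, b = affine_params(sum(state) + 1)
    candidates = []
    for i in range(2):                     # the l = 2 ways to write x
        target = gf_mul(a, x + i * NUM_VALUES) ^ b
        candidates.append((target - x_prime) % N_CELLS)
    w = min(candidates, key=lambda cell: state[cell])  # ties: first candidate
    if state[w] == q - 1:
        return None                        # block erasure required
    new_state = list(state)
    new_state[w] += 1
    return new_state


cells = [0] * N_CELLS
for value in [3, 1, 1, 0, 2, 3, 0]:
    cells = encode(cells, value, q=16)
    print(value, decode(cells)[0], cells)
```

Raising cell w by one level adds w to the weighted sum, so the new value of x' equals the chosen target, and the decoder can invert the affine map because it recomputes the same a and b from the new l1 norm.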

The following conjecture suggests that the ball-loading
performance of the above algorithm is identical to the random
loading algorithm with l = 2 random choices.

Conjecture 1: If l = 2 and $q - 1 = c \ln n$, then the
load-balancing modulation code has storage efficiency $\gamma = k$ with
probability $1 - o(1)$ as $n \to \infty$. If $q - 1 = c$, the storage
efficiency 7 — with probability l-o(l). 

Proof: [Sketch of Proof] Consider the affine permutation
$\pi_{a,b}(x) = h^{-1}(a h(x) + b)$ for $a \in H \setminus \{0\}$ and $b \in H$.
As a, b vary, this permutation maps the two elements $x_t$
and $x_t + l^k$ uniformly over all pairs of cell indices. After
$m = n(n-1)$ steps, we see that all pairs of a, b occur
equally often. Therefore, by picking the less charged cell, the
modulation code is almost identical to the random loading
algorithm with two random choices. Unfortunately, we are
interested in the case where $m \ll n(n-1)$, so the analysis is
somewhat more delicate. If $m = cn \ln n$, the highest charge
level is $c \ln n - 1 + \frac{\ln \ln n}{\ln 2} \approx c \ln n$ with probability $1 - o(1)$
[13]. Since $q - 1 = c \ln n$ in this case, the storage efficiency

cn In n loffn 2^ 

IS 7 = 

' nc in n 

and the maximum load is c — 1 + In Inn/ In 2 , „ . 
By definition, we have = g ~ 1. Therefore, we have 

_ cn log2 2'° _ cln 2 ^„ ^ 



fc. If m = cn, then q — 1 = c 

In In n 



7 



'(9-1) 



In hi 
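
The statement that all pairs $(a, b)$ occur equally often over $n(n-1)$ steps can be checked numerically for small $n$. The sketch below assumes the index rule of Step 2 of the encoding algorithm and works with the integer labels of $a_t$ and $b_t$ (that is, before $h$ is applied); the function name is hypothetical. Since $n$ and $n - 1$ are coprime, each of the $n(n-1)$ label pairs should appear exactly once per period.

```python
from collections import Counter


def pair_counts(n):
    """Count how often each label pair (a, b) occurs for r = 1, ..., n(n-1)."""
    counts = Counter()
    for r in range(1, n * (n - 1) + 1):
        a = (r % (n - 1)) + 1   # label of a_t, always in 1..n-1 (nonzero)
        b = r % n               # label of b_t, in 0..n-1
        counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    for n in (4, 8, 16):
        counts = pair_counts(n)
        print(n, len(counts), set(counts.values()))
```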



Remark 8: If $l = 2$ and $q$ is on the order of $O(\ln n)$, 
Conjecture 1 shows that the bound (7) is achievable by load- 
balancing modulation codes as $n$ goes to infinity. In this 
regime, the load-balancing modulation codes provide a better 
constant than self-randomized modulation codes by using 
twice as many cells. 

Remark 9: If $l = 2$ and $q$ is a constant independent 
of n, the storage efficiency is 71 = c In for the self- 
randomized modulation code and 72 — j^'"^^ log2 § for the 
load-balancing modulation code. However, the self-randomized 
modulation code uses $n = 2^k$ cells and the load-balancing 
modulation code uses $n = 2^{k+1}$ cells. To make a fair 
comparison of the storage efficiency between them, we let n = 
2^+1 for both codes. Then we have 71 = cln^^Iaii and 

^Ties can be broken arbitrarily. 

Fig. 1. Simulation results for random loading and the algorithms we proposed 
with $k = 3$, $l = 2$ and 1000 erasures. 

Fig. 2. Simulation results for random loading and codes in [4] with $k = 2$, 
$l = 2$, $n = 2$ and 1000 erasures. 

72 = -r^r-^ IoEt 77 . So, as ri ^ oo, we see that ^ ^ 0. 
Therefore, the load-balancing modulation code outperforms 
the self-randomized code when n is sufficiently large. 

V. Simulation Results 

In this section, we present the simulation results for the 
modulation codes described in Sections III and IV-B. In 
the figures, the first modulation code is called the "self- 
randomized modulation code" while the second is called the 
"load-balancing modulation code". Let the "loss factor" $\eta$ 
be the fraction of cell-levels which are not used when a 
block erasure is required: $\eta = 1 - \frac{m}{n(q-1)}$, where $m$ is 
the number of cell-levels used. We show the 
loss factor for random loading with 1 and 2 random choices 
for comparison. Note that $\eta$ does not take the amount of 
information per cell-level into account. Results in Fig. 1 show 
that the self-randomized modulation code has the same $\eta$ as 
random loading with 1 random choice and the load-balancing 
modulation code has the same $\eta$ as random loading with 
2 random choices. This shows the optimality of these two 
modulation codes in terms of ball loading. 
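
The random-loading baselines can be reproduced approximately with a small Monte Carlo sketch. It is not the simulation used for the figures; it assumes that a block erasure is required as soon as the selected cell is already at level $q - 1$, that the candidate cells are drawn independently and uniformly (with replacement), and that the loss factor is averaged over a fixed number of erasures. The function name, parameter names, and the example values $n = 64$, $q = 8$ are hypothetical.

```python
import random


def loss_factor(n, q, choices, trials=200, seed=0):
    """Average loss factor of random loading with the given number of choices."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        levels = [0] * n
        writes = 0
        while True:
            # Draw candidate cells and keep the least charged one.
            picks = [rng.randrange(n) for _ in range(choices)]
            cell = min(picks, key=lambda c: levels[c])
            if levels[cell] == q - 1:   # cell is full: erasure required
                break
            levels[cell] += 1
            writes += 1
        total += 1.0 - writes / (n * (q - 1))
    return total / trials


if __name__ == "__main__":
    for d in (1, 2):
        print(d, loss_factor(n=64, q=8, choices=d))
```

With two choices the cells fill more evenly, so fewer cell-levels are left unused at the time of erasure and the estimated loss factor is smaller.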



Fig. 3. Storage efficiency of self-randomized modulation code and load- 
balancing modulation code with $n = 16$ (panel title: $n = 16$, $l = 2$). 

Fig. 4. Storage efficiency of self-randomized modulation code and load- 
balancing modulation code with $n = 2^{10}$ (panel title: $n = 2^{10}$, $l = 2$; axis label: $q$). 



We also provide the simulation results for random loading 
with 1 random choice and the codes designed in [4], which 
we denote as the FLM-($k = 2$, $l = 2$, $n = 2$) algorithm, in 
Fig. 2. From the results shown in Fig. 2, we see that the FLM- 
($k = 2$, $l = 2$, $n = 2$) algorithm has the same loss factor as 
random loading with 1 random choice. This can also be 
seen from the proof of asymptotic optimality in [4], as the 
algorithm transforms an arbitrary input distribution into a 
uniform distribution on the cell-level increment. Note that 
the FLM algorithm is only proved to be optimal when 1 bit of 
information is stored, so we compare the FLM algorithm 
with the random loading algorithm only in this case. Fig. 3 and 
Fig. 4 show the storage efficiency $\gamma$ for the self-randomized and 
load-balancing modulation codes. They show that the load-balancing 
modulation code performs better than the self-randomized modulation 
code when $n$ is large. This is also shown by the theoretical analysis 
in Remark 9. 



VI. Conclusion 

In this paper, we consider the modulation code design problem 
for practical flash memory storage systems. The storage 
efficiency, or average (over the distribution of input variables) 
amount of information per cell-level, is maximized. Under 
this framework, we show that maximizing the number 
of rewrites under the worst-case criterion [7], [1], [2] and 
under the average-case criterion [4] are two extreme cases of our 
optimization objective. The self-randomized modulation code 
is proposed, which is asymptotically optimal for an arbitrary 
input distribution and arbitrary $k$ and $l$, as the number of 
cell-levels $q \to \infty$. We further consider the performance of 
practical systems where $q$ is not large enough for asymptotic 
results to dominate, and analyze the storage efficiency 
of the self-randomized modulation code when $q$ is only 
moderately large. The load-balancing modulation codes 
are then proposed based on the power of two random choices 
[13], [10]. Analysis and numerical simulations show that 
the load-balancing scheme outperforms previously proposed 
algorithms. 